Dragonfly

What is Dragonfly

Dragonfly is an intelligent P2P-based image and file distribution tool. It aims to improve the efficiency and success rate of file transfers and to maximize the use of network bandwidth, especially when distributing large amounts of data, as in application distribution, cache distribution, log distribution, and image distribution. At Alibaba, Dragonfly is invoked two billion times and distributes 3.4PB of data every month, and it has become one of the most important pieces of infrastructure there.

While container technologies make DevOps life easier most of the time, they also bring challenges, for example the efficiency of image distribution, especially when the same image has to be distributed to several hosts. The goal of Dragonfly is to tackle distribution problems in cloud native scenarios. The project comprises three main components:

  • supernode plays the role of central scheduler and controls the whole distribution procedure across the peer network
  • dfget resides on each peer as an agent that downloads file pieces
  • dfdaemon plays the role of a proxy that intercepts image download requests from the container engine and passes them to dfget

Main Features

This project is an open-source version of the Dragonfly used at Alibaba. You can find it here

Main Dragonfly Features:

  • P2P-based file distribution: By using P2P technology for file transmission, Dragonfly makes the most of each peer's bandwidth to improve download efficiency, and it saves a lot of cross-IDC bandwidth, especially costly cross-border bandwidth.

  • Non-invasive support to all kinds of container technologies: Dragonfly can seamlessly support various containers for distributing images.

  • Host-level speed limit: In addition to a rate limit on the current download task, which many other download tools (for example wget and curl) also offer, Dragonfly provides a rate limit for the entire host.

  • Passive CDN: The CDN mechanism can avoid repetitive remote downloads.

  • Strong consistency: Dragonfly can make sure that all downloaded files are consistent even if users do not provide a checksum (MD5).

  • Disk protection and highly efficient IO: Prechecking disk space, delaying synchronization, writing file blocks in the best order, isolating net-read/disk-write, and so on.

  • High performance: SuperNode is completely closed-loop, which means that it doesn't rely on any database or distributed cache, processing requests with extremely high performance.

  • Automatic isolation of abnormal nodes: Dragonfly automatically isolates abnormal nodes (peer or SuperNode) to improve download stability.

  • No pressure on file source: Generally, only a few SuperNodes will download files from the source.

  • Support for standard HTTP headers: Authentication information can be submitted through HTTP headers.

  • Effective concurrency control of Registry Auth: Reduces the pressure on the Registry Auth Service.

  • Simple and easy to use: Very few configurations are needed.
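
As an illustration of the host-level speed limit listed above, here is a minimal sketch of one way such a limit could work: a single token bucket shared by every download task on the host. The class name `HostRateLimiter`, its parameters, and the one-second burst size are assumptions made for this example and are not taken from Dragonfly's code.

```python
import threading
import time


class HostRateLimiter:
    """Token bucket shared by all download tasks on one host (hypothetical name)."""

    def __init__(self, bytes_per_second, clock=time.monotonic):
        self.rate = float(bytes_per_second)
        self.capacity = float(bytes_per_second)  # assumption: allow one second of burst
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self, nbytes):
        """Spend tokens and return True if nbytes may be sent now, else return False."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the bucket size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if nbytes <= self.tokens:
                self.tokens -= nbytes
                return True
            return False
```

Because every task calls `try_acquire` on the same object, the combined throughput of all tasks stays under the host budget, whereas a per-task limit only bounds each download on its own.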

Notable Milestones

  • 7 project maintainers from 4 organizations

  • 67 contributors

  • 21 contributing organizations

  • 4.6k+ GitHub stars

  • 100k+ downloads in Docker Hub

  • 120% increase in commits last year

How Does It Work

Dragonfly works slightly differently when downloading general files and when downloading container images.

Downloading General Files

The SuperNode plays the role of CDN and schedules the transfer of blocks between peers. dfget is the P2P client, which is also called a "peer". It is mainly used to download and share blocks.
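
The scheduling role described above can be sketched as follows. This is a toy model under stated assumptions, not Dragonfly's scheduler: the class `BlockScheduler`, its methods, and the first-holder-wins policy are all hypothetical and exist only to show how a central node can direct peers to fetch blocks from each other.

```python
from collections import defaultdict


class BlockScheduler:
    """Toy stand-in for the SuperNode's scheduling role (hypothetical API)."""

    def __init__(self):
        # block index -> peers that already hold the block, in order of arrival
        self.holders = defaultdict(list)

    def report(self, peer, block):
        """Record that a peer has finished downloading a block."""
        if peer not in self.holders[block]:
            self.holders[block].append(peer)

    def pick_source(self, peer, block):
        """Choose where `peer` should fetch `block` from."""
        for candidate in self.holders[block]:
            if candidate != peer:
                return candidate
        # No other peer holds the block yet, so fall back to the SuperNode's copy.
        return "supernode"
```

For example, after `report("peer-a", 0)`, a call to `pick_source("peer-b", 0)` returns `"peer-a"`, so the second download of that block does not touch the SuperNode.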

Downloading Container Images

The registry plays the same role as the file server above. The dfget proxy, also called dfdaemon, intercepts HTTP requests from docker pull or docker push and then decides which of them to process with dfget.
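
A minimal sketch of such a routing decision is shown below. The Docker Registry HTTP API V2 serves layer blobs under `/v2/<name>/blobs/<digest>`; the rule used here, that only GET requests for blobs go through dfget and everything else is passed straight to the registry, is an assumption made for illustration, and the function name `route_request` is hypothetical.

```python
import re

# Registry HTTP API V2 blob path: /v2/<name>/blobs/<digest>
BLOB_PATH = re.compile(r"^/v2/(?P<name>.+)/blobs/(?P<digest>sha256:[0-9a-f]{64})$")


def route_request(method, path):
    """Return "dfget" for requests sent through P2P, else "direct" (assumed rule)."""
    if method == "GET" and BLOB_PATH.match(path):
        return "dfget"
    # Manifests, uploads (docker push), and API version checks go to the registry.
    return "direct"
```

Routing only the large, immutable layer blobs through P2P is a plausible design because they dominate the bytes transferred, while manifests are small and cheap to fetch directly.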

Downloading Blocks

Every file is divided into multiple blocks, which are transferred between peers. Each peer is a P2P client. The SuperNode checks whether the corresponding file exists on its local disk. If not, the file is downloaded to the SuperNode from the file server.
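
The block flow above can be sketched as follows, assuming fixed-size blocks and a SHA-256 digest per block. The block size, the digest choice, the class `SuperNodeCache`, and the in-memory dictionary standing in for the local disk are all assumptions for this example rather than Dragonfly's actual implementation.

```python
import hashlib


def split_into_blocks(data, block_size):
    """Cut data into fixed-size blocks, each paired with its SHA-256 digest."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return blocks


class SuperNodeCache:
    """Fetches a file from the source once, then serves its blocks from cache."""

    def __init__(self, fetch_from_source):
        self.fetch_from_source = fetch_from_source  # callable: url -> bytes
        self.files = {}  # url -> list of (digest, block); stands in for local disk

    def get_blocks(self, url, block_size=4):
        if url not in self.files:
            # Cache miss: download from the file server and split into blocks.
            data = self.fetch_from_source(url)
            self.files[url] = split_into_blocks(data, block_size)
        return self.files[url]
```

A peer that receives a block can recompute its digest and compare it with the one stored alongside the block, which is one way to keep downloads consistent even when the user supplies no checksum.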